Downregulated NPAS4 in multiple brain regions is associated with major depressive disorder

Major Depressive Disorder (MDD) is a commonly observed psychiatric disorder that affects more than 2% of the world population with a rising trend. However, disease-associated pathways and biomarkers are yet to be fully comprehended. In this study, we analyzed previously generated RNA-seq data across seven different brain regions from three distinct studies to identify differentially and co-expressed genes for patients with MDD. Differential gene expression (DGE) analysis revealed that NPAS4 is the only gene downregulated in three different brain regions. Furthermore, co-expressing gene modules responsible for glutamatergic signaling are negatively enriched in these regions. We used the results of both DGE and co-expression analyses to construct a novel MDD-associated pathway. In our model, we propose that disruption in glutamatergic signaling-related pathways might be associated with the downregulation of NPAS4 and many other immediate-early genes (IEGs) that control synaptic plasticity. In addition to DGE analysis, we identified the relative importance of KEGG pathways in discriminating MDD phenotype using a machine learning-based approach. We anticipate that our study will open doors to developing better therapeutic approaches targeting glutamatergic receptors in the treatment of MDD.


Results
We combined RNA-seq datasets from three different sources 4,7,15 containing sequenced brain tissue samples from postmortem control and major depression patients to identify statistically significant transcriptional changes.We analyzed the raw RNA sequencing reads and measured the expression levels of genes for each sample.The quality of each sample was assessed, and a few samples were discarded from the analysis due to having low quality (see "Methods").Then, we followed the general pipeline of RNA-seq data analysis (see "Methods") by performing alignment to the human genome and counting the reads aligned with each gene.We grouped the counts according to the brain region they belonged to and identified genes that are differentially expressed relative to the control group (padj < 0.05) for each region by using the DESeq2 R package 16 .We did not apply any log-fold change cut-off to our analysis.
To investigate the potential transcriptional similarities between different brain regions, we first calculated pairwise Spearman's correlations using log2FC values of commonly expressed genes (Fig. 1A).No strong correlation was observed between the two regions.The strongest correlation was observed between the orbitofrontal cortex and ventral subiculum with pairwise Spearman's correlation score of 0.28.Therefore, we can conclude that different disease-related signatures were observed in different brain regions.
Then, we focused on genes that are differentially expressed for each region independently.Out of the seven regions, we identified at least one differentially expressed gene in DLPFC, nACC, and vSUB but not in other brain regions.The highest number of differentially expressed genes was observed in DLPFC (sample size of n = 150) with 87 differentially expressed genes, and this was followed by nACC (n = 94) with six genes and vSUB (n = 43) with two genes (Fig. 1B).When we intersected lists of differentially expressed genes for these three regions, we discovered that a brain-specific transcription factor NPAS4 17 was the only common gene (Fig. 1b) that was downregulated in all three regions.It was previously shown in mice [18][19][20][21] and in a study monitoring 152 ischemic stroke patients 22 that decrease in NPAS4 expression is correlated with the MDD phenotype.Because NPAS4 was identified as the single common downregulated gene, we aimed to further investigate the shared transcriptional profile between different regions.Therefore, we combined samples from three regions (DLPFC, nACC, and vSUB) which we observed differential gene expression and reached a sample size of 287 (143 CTRL, 144 MDD) to perform a DGE analysis by adding a covariate of "brain region" to eliminate regionspecific variations in gene expression.As presented in the volcano plot (Fig. 2A), 149 genes were found to be differentially expressed (padj < 0.05) with a general trend of downregulation.We suggest that this was mainly due to the top three (padj:2 × 10 -27 , 4.9 × 10 -11 , 2.3 × 10 -8 ) downregulated transcription factors (NPAS4, FOS, and FOSB) (Fig. 2B).
To gain more insight into the pathways involved in MDD phenotype, we performed a co-expression analysis using CEMiTool 23 for the brain regions we observed NPAS4 downregulation to reveal correlating gene modules.As an input, we used the same normalized count matrix for DGE analysis.The co-expression analysis yielded two co-expressed gene modules (padj < 0.1) as modules 1 and 2. After introducing sample annotations as MDD and control, we identified that both of the modules show positive enrichment in control patients and negative enrichment in MDD patients (Supplementary Fig. 1).For the first module (128 genes) control group had normalized enrichment score (NES) of 1.49 (padj = 0.048) and MDD group had -1.48 (padj = 0.036).Furthermore, for the second gene (60 genes) module control group had NES of 1.42 and MDD group had −1.44 (padj = 0.065).Overall, higher enrichment means a higher activity of the module for a given group and the opposite is true for the negatively enriched group.Because the activity of each module is correlated with the expression levels of the samples, we can conclude that co-expressed genes are mainly downregulated for patients with depression.Although we repeated the same analysis by including all samples, we couldn't identify functionally relevant co-expressing modules.Therefore, we can conclude that NPAS4 downregulation is likely to be the driving force behind the changes observed in co-expression modules.We further explored the functional implications of these modules to understand their possible disease relevance.
We performed gene set enrichment analysis through a web-based tool Enrichr [24][25][26] , and presented the top 10 KEGG (Kyoto Encyclopedia of Genes and Genomes) [27][28][29] pathways based on their combined score (Tables 1, 2 and www.nature.com/scientificreports/ 3) for differentially genes and co-expressed gene modules.Enrichment of differentially expressed genes yielded 18 pathways (padj < 0.05) related to inflammation, such as the IL17 signaling pathway, Rheumatoid arthritis, NFkappa B signaling pathway.It has been previously suggested that IL-17A induces depressive behavior in mice 30,31 , but studies including human subjects [32][33][34] have contradicting conclusions.Lui et al. 35 showed that higher serum levels of IL-17 are positively correlated with the severity of anxiety in patients with rheumatoid arthritis.The involvement of interleukins and cytokines was previously discussed numerous times [36][37][38] .However, the genes highlighted in both these studies and our own analysis lack a strong association with the disorder.Therefore, they should be mainly considered as biomarkers.We suggest that a single pathway might be insufficient to capture the full breadth of disease biology; instead, contrasting multiple pathways can provide a more holistic perspective on observed patterns, in our case inflammation related genes.It should be noted that 91 out of 147 differentially expressed genes (including NPAS4) are not present in any KEGG pathway.This suggests that these pathways should not be considered representatives of all differentially expressed genes.www.nature.com/scientificreports/ We further investigated pathways enriched for individual co-expressed gene modules comparatively to reveal the significant patterns observed within modules.We called the first module as "glutamatergic signaling module" (Table 2) because we observed a strong enrichment for the addiction 39,40 glutamatergic synapse, and circadian entrainment 41,42 pathways that were mainly controlled by AMPA (α-amino-3-hydroxy-5-methyl-4isoxazolepropionic acid) and NMDA (N-methyl-D-aspartate) glutamate receptor activity.To visualize the important co-expressed genes in pathways, we constructed a network visualized in Fig. 2C.In this visual presentation, we added edges between genes based on their co-occurrence in top 9 KEGG.Here, genes with high co-occurrence are clustered closely.Notably, glutamate receptors GRIN1, GRIN2B, GRIA2, Ca 2+ dependent CALM3, CAMK3B, PRKCB, ADCY5, and G proteins GNAO1, GNG7 are form the core of this network.Lastly, we called the second module "synaptic vesicle and secretion module" (Fig. 2D and Table 3) because it contained genes ATP2B2 responsible for Ca +2 secretion, ATP1A3, ATP1A1 responsible for Na + /K + transport, CAMK2A associated with Ca +2 , GNAS and ADCY1 associated with G proteins.Therefore, the synaptic vesicle cycle, different secretionrelated pathways, and pathways related to calcium transport are highly enriched.Negative enrichment scores of these two modules for the MDD suggest that glutamatergic signaling activity is downregulated for the brain regions where we observed NPAS4 as a common downregulated gene.This suggests a sequence of events leading NPAS4 downregulation.Previous research has highlighted the significance of both glutamatergic signaling and Ca 2+ release in regulating NPAS4 induction 43 .Our findings postulate that the downregulation of glutamatergic pathways may trigger a dysregulation in Ca 2+ transport resulting in to decreased NPAS4 expression and activity.NPAS4 has been involve in controlling the expression of genes linked to glutamatergic synapses, suggesting a reciprocal relationship between NPAS4 and these synapses 44 .The downregulation of these genes and pathways can result in reduced synaptic plasticity leading MDD 45 .Interestingly, when we extended the co-expression analysis across all brain regions in our study, we failed to reveal a significant pathway enrichment, reinforcing the unique association between NPAS4 and the identified pathways.
While DGE and co-expression analyses provided important insights about the changes in observed MDD patients, they are specifically designed to identify linear associations observed in expression data for pre-determined conditions (e.g., disease and control).However, in reality, disease biology can be much more complex that these analyses may not capture the true essence of the changes especially for the diseases such as MDD.Thus, in addition to DGE and co-expression analyses, we adopted a machine learning-based approach called multiple kernel learning (MKL).This method is specifically designed to capture non-linear relationships between genes and gene groups, helping identify disease-associated biological mechanisms.Notably, the same computational framework was shown to be successful in identifying features that predict cancer stages 46 and the survival of individuals 47 .In our analysis, KEGG pathways were used to identify the informative gene groups to discriminate MDD patients from the control group.In this method, each pathway was mapped to a gene expression matrix, and distinct kernel matrices were calculated for each pathway.Using the optimized weighted combination of these kernel matrices, the algorithm finds a sparse set of pathways by discarding uninformative ones from the collection.We can infer the relative importance of the pathways by considering their resulting kernel weights.We used the normalized gene expression values from all brain regions and samples (n = 457) to identify the common underlying biological mechanisms associated with MDD.
We reported the area under the receiver operating characteristic curve (AUC) values over 100 replications to evaluate the algorithm's performance.The predictive performance of the MKL algorithm is increased when we included samples from all regions compared to three regions containing differentially expressed genes (Fig. 3A) indicating that including more brain regions and samples in the analysis increases the reliability of the prediction model.We achieved an average AUC score of 0.84 with a standard deviation of 0.04 for the model including all brain regions (Fig. 3A).21 pathways were selected as informative, at least in 50 replications (Fig. 3B).Pathways "Linoleic acid metabolism, " "Viral protein interaction with cytokine and cytokine receptor, " "Olfactory transduction, " "Staphylococcus aureus infection, " "Chemical carcinogenesis-DNA adducts, " and "Graft-versus-host disease" were selected as informative in all replicates.Because some of the chosen pathways do not directly relate to brain tissue, we would like to elaborate on the results by categorizing them based on the gene groups they share.Thus, we divided pathways into two main categories based on their functional relevance and gene composition.The first cluster contained eight pathways (Linoleic acid metabolism, Chemical carcinogenesis-DNA adducts, Ovarian steroidogenesis, Primary bile acid biosynthesis, Fat digestion and absorption, Maturity onset diabetes of the young, Metabolism of xenobiotics by cytochrome P450, Retinol Metabolism, and Drug metabolism-cytochrome P450) containing genes related to synthesis, absorption, and metabolism of lipids.In this group, genes related to the cytochrome p450 (CYP) family are abundant and shared between different pathways.Previous studies have focused on variants in CYP genes and their association with SSRI metabolism and the effectiveness of the treatment [48][49][50][51] On the other hand, our approach puts forward the idea that they can be used for diagnosis."Nitrogen metabolism" and "Maturity onset diabetes of the young" can also fit in this category because they are related to metabolism.Several studies 52,53 demonstrate the role of metabolism in patients with MDD.The second major group contained five pathways (Viral protein interaction with cytokine and cytokine receptor, Staphylococcus aureus infection, Graft-versus-host disease, Hematopoietic cell lineage, and Cytokine-cytokine receptor interaction) related to inflammation and immune system which is parallel to the enrichment of differentially expressed genes that we identified.The remaining four pathways were related to perceiving external stimuli through receptors (Olfactory transduction, Neuroactive ligand-receptor interaction, and Phototransduction) and glycosylation (Mucin type O-glycan biosynthesis).Overall, using KEGG pathways as features, we discriminated against MDD patients with high accuracy.The pathways we identified as discriminative can serve as a starting point for the research on MDD diagnosis.

Discussion
Our study combined multiple publicly available RNA-Seq datasets to identify novel pathways and genetic markers associated with MDD.A large sample size increased the sensitivity of the analysis, which led to the discovery of novel gene-disease associations.On the other hand, combining datasets from different sources introduces a certain amount of noise to the analysis.Moreover, Brodmann areas, individual segments of the cerebral cortex defining boundaries of each brain region, for a given region might slightly differ between studies.Therefore, we performed a preliminary quality filtration step and used the "study" covariate in our DGE analysis to eliminate some of that noise.
Our results show that the dorsolateral prefrontal cortex is the most affected region based on the number of differentially expressed genes, and downregulation of NPAS4 is observed for multiple brain regions.It should be highlighted that the larger change observed in DLPFC can also be attributed to its larger sample size.It has been previously demonstrated that NPAS4 plays a role in memory 54 , modulating inhibitory-excitatory balance [55][56][57] , epileptogenesis in mice, cocaine-induced hyperlocomotion 43 , cognitive well-being and many other diseases 19,58,59 .While the association between NPAS4 and MDD has been shown in mice previously 20 , we validated the same relationship for humans and multiple brain regions.Supporting our findings, Gu et al. showed that patients with post-stroke depression had lower expression levels of NPAS4 in their peripheral blood mononuclear cells 22 , which makes NPAS4 a potential diagnostic biomarker in the future.Our study suggests the central role of NPAS4 in major depression as an association factor.Although this study suggests a potential causation role of NPAS4 in the downregulation of synaptic plasticity in MDD, this hypothesis needs to be tested experimentally in model species.
To highlight the role of NPAS4 and understand that the relationship between differentially genes and coexpressed gene modules, we gathered our findings into an MDD model (Fig. 4).In this model, we combined our findings with the existing literature on connections between genes and pathways.In our model, we show that NMDA and AMPA glutamate receptors can induce the expression of NPAS4 and other IEGs through controlling Ca 2+ influx 60,61 .Previous research points to a synergistic interaction between glutamatergic synapses and NPAS4, where both amplify each other's activity to increase synaptic plasticity 43,44 .ChIP-seq enhancer data of NPAS4 within mouse cortical neurons show that NPAS4 regulates immediate early genes 62 .Differentially expressed genes and this experiment include IEGs in common; FOS, FOSB, NR4A1, NR4A3, JUNB, and NPAS4 itself.ChIP-seq data of NPAS4 embryonic mouse 14 days medial ganglionic eminence (mostly containing excitatory neurons) and cortex (mostly inhibitory neurons) shows that NPAS4 regulates distinct sets of late-response genes in inhibitory and excitatory neurons 57 .Genes identified in our DGE analysis PTGS2, ATF3, ETV3, and CSRPN1 which were also regulated by NPAS4 in inhibitory neurons.By controlling the expression of other IEGs, which are also transcription factors, NPAS4 indirectly regulates the expression of many different genes as a master transcription factor controlling synaptic plasticity.Supporting our model, existing studies have shown that FOS, FOSB, and their splice variants [63][64][65] are associated with motivation and depressive behavior 66 .Also, some antidepressants have been shown to increase the expression of FOS 67 and NPAS4 68 .These immediate early genes play important roles in maintaining essential synaptic functions [69][70][71] .In our analysis, we observed a significant downregulation trend in pathways regulated by glutamatergic receptors.These pathways influence IEGs that regulate synaptic plasticity, circadian entrainment [72][73][74] , and learning abilities (long-term potentiation) in patients with MDD.
Building on the insights from our model which highlights the significance of glutamatergic receptors in the context of MDD, it's worth revisiting existing therapeutic strategies.Traditionally, the focus has been on aminergic receptors, especially serotonin and dopamine 75 .However, these monoamine-oriented treatments have been ineffective, especially for patients with treatment-resistant depressions 76,77 .Our study suggests that glutamatergic receptors can be used as drug targets in the treatment of MDD, aiming to restore the lost synaptic plasticity.In support of our hypothesis, NMDA antagonist ketamine and its enantiomer esketamine have been shown to be effective for patients with treatment resistant MDD 76,[78][79][80] .Although esketamine is an antagonist of the NMDA receptor, it leads to the activation of AMPA receptors 81 that increase synaptic plasticity.Furthermore, the trial of NMDA co-agonist glycine triggered depressive symptoms in mice 82 .Thus, we conclude that the results of these drug trials align with the model we proposed in this study.We anticipate that antidepressants targeting glutamatergic signaling pathways will gain more popularity.

Data analysis
Quality trimming FASTQC 0.11.7 was used to check the quality of each sample.We eliminated some of the samples directly from the analysis due to having very low quality in general.For the samples having low quality towards the 3' end, we

Read count
HTSeq 87 was used to obtain read counts for each patient.The distribution of counts for each region is given in Fig. 1.Ensembl GRCh37 annotation list was used as a reference.

Differential gene expression analysis
Differential gene expression analysis based on the negative binomial distribution was performed in R with DESeq2 package 16 .Genes that significantly differentially expressed (adjusted p-value < 0.05) between major depressive patients and the control group were identified regarding sex, age, study and brain region that the sample is obtained from, and postmortem interval covariates (full model, design ~ sam-pleDataset + sampleGender + sampleAge + PMI + brainRegion + condition; brain region-specific model, design ~ sampleDataset + sampleGender + sampleAge + PMI + condition).

Co-expression analysis
R package CEMiTool 23 was used to perform co-expression analysis.Normalized count data from DLPFC, vSUB and nACC were included in the analysis.Variance stabilizing transformation was not applied before filtering the genes and default filtering p-value was used (0.1).Label of each sample was provided to obtain normalized enrichment scores for each of the modules in control group and MDD patients.

Identification of MDD-associated pathways using MKL algorithm
A multiple kernel learning (MKL)-based machine learning approach 46 was used to identify informative pathways in discriminating MDD patients.Instead of first identifying the expressed genes and then performing a gene set enrichment analysis using these selected genes, the proposed MKL-based algorithm considers whole expression matrix and each pathway from the given collection at the same time.In this method, each pathway is mapped to a different kernel function using the expression profiles of the genes in the given pathway.Kernel functions are defined as the similarity measures between pairs of samples, and it is known that weighted combination of several kernel functions (i.e., MKL) increases the predictive ability of the kernel-based methods 88 .At the end, the proposed method converges to a solution where kernels with non-zero weights are included in the final model for the classification.We considered that a pathway is selected to be used in the final model if the corresponding kernel weight was greater than 0.01.The experimental setting that we used in machine learning model is as follows.We split our dataset by randomly picking 80% as training and 20% as test set.While splitting the data, we kept the ratio between the control group and MDD patients same in the training and test partitions.We repeated this procedure 100 times to obtain more robust performance measures and reported the experimental results over these 100 replications.We performed fourfold inner cross-validation for selecting the model parameters (i.e., regularization parameter C).Since the gene expression is a normalized count data, we first log2-transformed our dataset.Following that, we further normalized the training set to have zero mean and unit standard deviation, while we normalized the test set using the mean and the standard deviation of the original training set.We followed the same computational setting as proposed in 46 to obtain the relative importance of pathways.

Figure 1 .
Figure 1.Differential gene expression analysis for different brain regions (A) Spearman's correlation of log2FC values between investigated brain regions.(B) Venn diagram showing the number of differentially expressed genes for DLPC, vSUB, and nACC.

Figure 2 .
Figure 2. DGE and co-expression analysis of DLPFC, vSUB and nACC (A) Volcano plot for the DGE analysis of three regions.(B) Box plots for the top three differentially expressed genes NPAS4, FOSB, and FOS.(C, D) Top genes in glutamatergic signaling and synaptic vesicle cycle, and secretion co-expression modules enriched in KEGG pathways.Edges indicate co-occurrence in the same pathway.Higher edge width indicates higher co-occurrence.Force directed layout is used for visualization.

Figure 3 .
Figure 3. Multiple kernel learning results.(A) Area under curve comparison of multiple kernel learning for 100 replications.(B) Pathways selected as discriminative in 100 replication more than 50 times.

Figure 4 .
Figure 4. Summary of differentially and co-expressed genes and the enriched pathways.

Table 1 .
KEGG 2021 pathway enrichment for the differentially expressed genes.Overlap: The overlap between the gene set and the pathway.

Table 4 .
Demographics of study groups.

Table 5 .
Distribution of samples by brain regions and the study groups that they belong to.